Suppose we have a car dealership dataset that has a row for each week of the year and a column for each colour of car indicating how many cars of that colour were sold that week.
DATE | Red Cars | Silver Cars | Black Cars | Blue Cars
What if we then wanted to create an aggregated column which held the total number of cars sold that week?
DATE | Red Cars | Silver Cars | Black Cars | Blue Cars | TOTAL
There are two main ways to handle this:
- Use a calculated field to sum the columns.
- Pivot the data, aggregate across Date & Dealership, and join the totals back to the original date.
The first option is by far the quickest and retains the individual car colour sales values with no extra work. Within a Clean step, simply create a calculated field called “Total Cars Sold” and enter in [Red Cars] + [Silver Cars] + [Black Cars] + [Blue Cars]. Boom, problem solved. You can see a workflow that uses this solution below.
This works fine for a one-off report or a quick answer, however what if we start adding more categories of cars in the future? Each time you’d have to manually update this function. This is where the pivot & aggregate method comes in handy, albeit at the cost of setup time.
To use the pivot & aggregate method:
1: Create a Pivot step.
2: Make sure the pivot is set to “Columns to Rows”.
3: Click where it says to use a wildcard pivot and type “Cars”. This is a case sensitive search, so the “C” needs to be capitalized! Hit enter, and you should see your car columns pivot into rows and a new column appear called [Cars] which contains the values for each car colour.
4: Add an Aggregation step to the right of your Pivot step.
5: Group by [Date] and [Dealership] and aggregate by the sum of [Cars].
6: Add a Join step and join by [Dealership] and [Date] like below. This is so that we get both the total car sales whilst also keeping our original car colour sales breakdowns.
7: Finally, remove any duplicate fields and output the data!
You can see a full workflow using this method below.
The main danger of this method comes from the ambiguity of the wildcard search. If another field gets added which contains “Cars” but isn’t a type of car being sold you could be in trouble!